Foundations

This analysis came out of a collaboration between the School of Design at Penn and the City of Louisville as part of a practicum class. With a dataset of 5 million unique incidents of traffic slowing or stopping from the city of Louisville, we attempt to determine whether the structure of the street network corresponds to troubles during the typical morning commute. Mapping every kink in vehicle flows during the month of June, we can see that hot spots concentrate around major roads, unsurprisingly. Yet this does not tell us about the structure underpinning this phenomenon. Before producing a predictive model, in the interest of extracting valuable features from the urban fabric, we want to measure the relationships between streets and test how these measures associate with traffic congestion.

Jams

Getting from point A to point B in a city does not require hundreds of turns; drivers can simplify things by hopping on an expressway and then finishing the journey on back roads. The issue is one of topology. Topology, the study of spatial relations, is the study of how things interact, disregarding much of what happens in between interactions. Time on the expressway is less important than the number of turns. Topology allows us to understand the fragility of urban networks (what would happen if this road were closed?), the benefits of street arrangements (is a grid or a mess of streets better?), and countless other questions about cities and their function. Topology is key to understanding the twin issues of efficiency and robustness, and thus how to trade off the costs of redundancy against the risks of failure.

The following tests whether graph theory can help us better understand urban networks. This study begins by transforming data grounded in geography, or positional data, into relational data, with information regarding not just where a thing is but also how it relates to other things (in this case, a street and other streets). After recasting streets as nodes in a network, and their intersections as links between them, it then takes on the challenge of understanding how important each street is to the functioning of the network as a whole. A naive approach might extract nodes at intersections, using ArcGIS or QGIS, but this would be akin to taking the connections in a social network as nodes and the friends themselves as links. Streets interact at their intersections, but, at least to this urbanist, the streets are the focus: the locus of activity and the scaffold for urban life. We move through and to streets, not intersections. We also use network analysis to link peripheral suburban communities to the urban core using optimal routing calculations. Taking each commuter in the metropolitan area and guessing his or her commute, it turns out, captures a great deal of the variance seen in traffic congestion.
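The recasting described above, with streets as nodes and shared intersections as links, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `intersections` table and its street names are hypothetical toy data, and the function name `streets_as_nodes` is invented here rather than taken from the analysis.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy input: each intersection lists the named streets meeting there.
intersections = {
    "i1": ["Main St", "1st Ave"],
    "i2": ["Main St", "2nd Ave"],
    "i3": ["Broadway", "1st Ave"],
    "i4": ["Broadway", "2nd Ave", "River Rd"],
}


def streets_as_nodes(intersections):
    """Build an adjacency map in which streets are nodes and any two
    streets that share an intersection are linked."""
    graph = defaultdict(set)
    for streets in intersections.values():
        # Every pair of streets meeting at this intersection gets a link.
        for a, b in combinations(sorted(set(streets)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return {street: sorted(neighbors) for street, neighbors in graph.items()}


street_graph = streets_as_nodes(intersections)
for street, neighbors in sorted(street_graph.items()):
    print(street, "->", neighbors)
```

In this toy network, Broadway touches three other streets, so it ends up with the most links, which is the kind of relational information a purely positional dataset does not expose.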

Features

How critical or central is a stretch of pavement to the functioning of the city? Centrality can be measured as betweenness, closeness, or degree. If we list every pair of vertices in a network along with the shortest path between them, counting the number of times any given node is used along those paths gives its betweenness. Considering simply the distance from one vertex to all others gives its closeness. For a vertex, counting the number of edges attached gives its degree. For a street, this is just how many other streets touch it. By way of example, the following shows the various measures for the city of Louisville. We can see that high betweenness values cut across the city while high closeness values favor the streets nearest the core. In spatially constrained networks, then, the center in terms of closeness is also likely to be the geographic center, close to the action. The center in terms of betweenness is likely to involve geographic constraints: in a city bisected by a river, the only bridge lies between all the nodes on one side and all the nodes on the other. We can calculate these measures for each segment in Python, with the osmnx package, and incorporate them into predictions of jams.
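To make the three definitions concrete, here is a minimal standard-library sketch. It is not the osmnx workflow used in the analysis. The toy `graph` (two clusters of streets joined by one bridge) is hypothetical, closeness is taken as (n - 1) divided by the sum of shortest-path distances, which is one common convention, and betweenness is computed with Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque

# Hypothetical street graph: two clusters joined by a single bridge.
graph = {
    "West A": ["West B", "Bridge"],
    "West B": ["West A", "Bridge"],
    "Bridge": ["West A", "West B", "East A", "East B"],
    "East A": ["Bridge", "East B"],
    "East B": ["Bridge", "East A"],
}


def degree(graph):
    """Number of streets touching each street."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness(graph):
    """(reachable nodes) / (sum of shortest-path distances), via BFS."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def betweenness(graph):
    """Unnormalized betweenness (Brandes), counting each pair once."""
    scores = {node: 0.0 for node in graph}
    for source in graph:
        order = []                             # nodes in BFS visiting order
        preds = {node: [] for node in graph}   # predecessors on shortest paths
        sigma = {node: 0 for node in graph}    # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in graph[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    sigma[nb] += sigma[node]
                    preds[nb].append(node)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = {node: 0.0 for node in graph}
        for node in reversed(order):
            for p in preds[node]:
                delta[p] += sigma[p] / sigma[node] * (1 + delta[node])
            if node != source:
                scores[node] += delta[node]
    # Each undirected pair was visited from both ends, so halve the totals.
    return {node: score / 2 for node, score in scores.items()}


print(degree(graph))
print(closeness(graph))
print(betweenness(graph))
```

The bridge in this toy graph carries every shortest path between the two sides, which mirrors the river example above: its betweenness is the number of cross-river pairs, while every other street scores zero.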

Measures

We then estimate usage with commuting data from the Census, which provides origins and destinations and is accessible through the lehdr package in R. Using the network we derived earlier, we can then impute shortest paths between these points, incorporating factors like road speeds. The Census estimates all commutes between tracts, so as one additional layer of imputation, we sample random points within those tracts, based on the number of jobs travelling from point A to point B, and connect them via a shortest path. This means that each multiline is a single commute rather than a bundle of commutes. It also adds noise to the imputation, as points from different areas of a tract, rather than its centroid, will follow different paths (a noise that we believe usefully hedges against a flawed imputation by guessing at an array of them). All of this work can be accomplished with the sf package in R, though osmnx also provides a quick and easy process.
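The imputation step can be sketched roughly as follows, in plain Python rather than sf or osmnx. Everything here is an assumption made for illustration: the `roads`, `tract_nodes`, and `flows` tables are invented toy data, travel time is simply length divided by speed, and random network nodes stand in for random points inside a tract. Routing uses Dijkstra's algorithm on travel time.

```python
import heapq
import random

# Hypothetical two-way roads: (from, to) -> (length in km, speed in km/h).
roads = {
    ("a", "b"): (2.0, 30.0),   # slow local street
    ("b", "c"): (2.0, 30.0),
    ("a", "d"): (3.0, 90.0),   # longer but faster expressway
    ("d", "c"): (3.0, 90.0),
}
# Hypothetical tracts, each listing the network nodes that fall inside it.
tract_nodes = {"north": ["a", "b"], "south": ["c"]}
# Hypothetical flows: (home tract, work tract, number of jobs).
flows = [("north", "south", 3)]


def build_graph(roads):
    """Adjacency map weighted by travel time in minutes."""
    graph = {}
    for (u, v), (km, kmh) in roads.items():
        minutes = km / kmh * 60
        graph.setdefault(u, {})[v] = minutes
        graph.setdefault(v, {})[u] = minutes
    return graph


def fastest_path(graph, origin, destination):
    """Dijkstra's algorithm; returns the node sequence or None."""
    best = {origin: 0.0}
    previous = {}
    heap = [(0.0, origin)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == destination:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nb, minutes in graph[node].items():
            new_cost = cost + minutes
            if new_cost < best.get(nb, float("inf")):
                best[nb] = new_cost
                previous[nb] = node
                heapq.heappush(heap, (new_cost, nb))
    if destination not in best:
        return None
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    return path[::-1]


def sample_commutes(graph, tract_nodes, flows, seed=0):
    """One routed commute per job, with endpoints drawn at random per tract."""
    rng = random.Random(seed)
    commutes = []
    for home_tract, work_tract, jobs in flows:
        for _ in range(jobs):
            origin = rng.choice(tract_nodes[home_tract])
            destination = rng.choice(tract_nodes[work_tract])
            commutes.append(fastest_path(graph, origin, destination))
    return commutes


graph = build_graph(roads)
commutes = sample_commutes(graph, tract_nodes, flows)
print(commutes)
```

Because the weights are travel times rather than distances, a commute from `a` to `c` takes the longer expressway through `d`, which is the behavior that incorporating road speeds is meant to capture.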

Commutes

From there, we can bundle all usage by street segment to create weights, using the stplanr package in R. These weights constitute an input into the regression, representing where we believe the network should be busy. Looking at the map, we can already see that some of the most jammed segments are those that are thickest and thus have the most expected commutes.
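Conceptually, the bundling step counts how many imputed commutes pass over each segment. The sketch below shows that idea in plain Python rather than stplanr. The `commutes` list is hypothetical toy data, and segments are treated as undirected node pairs, which is a simplifying assumption.

```python
from collections import Counter

# Hypothetical imputed commutes, each a sequence of network nodes.
commutes = [
    ["a", "d", "c"],
    ["b", "c"],
    ["a", "d", "c"],
    ["b", "a", "d"],
]


def segment_weights(commutes):
    """Count the commutes that traverse each undirected segment."""
    weights = Counter()
    for path in commutes:
        for u, v in zip(path, path[1:]):
            # Sort the endpoints so (a, d) and (d, a) are the same segment.
            weights[tuple(sorted((u, v)))] += 1
    return weights


weights = segment_weights(commutes)
for segment, count in weights.most_common():
    print(segment, count)
```

The resulting counts play the role of the line thickness on the map: the segment between `a` and `d` carries the most expected commutes in this toy example.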

Estimates

The left side of the regression in our analysis is the number of jams according to Waze, a popular routing application that provides this data to local governments. There are at least two potential use cases for such a model: if topology can adequately predict congestion on a road, it may inform policies used to determine the impacts of additions to or subtractions (closures) from the road network, and if time-space modelling can predict day-to-day or hour-to-hour variations, temporary adjustments could also be assessed.

Observed congestion

We take the log of our dependent variable to combat its skew. We then line up the right side with a series of topological and morphological traits (how wide, how long, how fast) for each street segment and use them to predict the number of jams that the segment will experience in a week.
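A small sketch of why the log helps with skewed counts is given below. The jam counts are hypothetical, and the use of log(1 + count) is an assumption made here so that segments with zero jams remain defined; the text above does not say how zeros were handled.

```python
import math

# Hypothetical weekly jam counts per segment: many small values, a few huge.
jams_per_week = [0, 1, 2, 3, 5, 8, 40, 250]


def log_counts(counts):
    """log(1 + count); an assumed variant that keeps zero-jam segments."""
    return [math.log1p(c) for c in counts]


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


print("raw skew:", round(skewness(jams_per_week), 2))
print("log skew:", round(skewness(log_counts(jams_per_week)), 2))
```

On data shaped like this, the transformed values are noticeably less right-skewed, which makes a linear model less dominated by a handful of extreme segments.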

Models

The twin challenges here are spatial and spatio-temporal modelling: for transportation planning, a model to understand flows beyond daily fluctuations is better, while other use cases might demand predictions around adjustments to the network. The hypothesis that we are testing here is that network centrality, or various centralities, matters in the prediction of traffic congestion. It turns out that using commutes and a few other road characteristics explains about 48% of the variation in jams across the city during an average week. This is a simple linear model and it is systematically biased, showing stark patterns in its residuals.
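The "share of variation explained" quoted above is the R-squared of the regression. As a minimal sketch of how that figure and the residuals are obtained, the example below fits a one-predictor least-squares line in plain Python. The model in the text uses several predictors; the single predictor and the numbers in `expected_commutes` and `log_jams` are hypothetical and are not results from the analysis.

```python
# Hypothetical toy data: expected commutes per segment and log jam counts.
expected_commutes = [1.0, 2.0, 3.0, 4.0, 5.0]
log_jams = [1.0, 1.5, 3.5, 3.0, 5.0]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed minus fitted values."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def r_squared(x, y, intercept, slope):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_res = sum(r ** 2 for r in residuals(x, y, intercept, slope))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


intercept, slope = fit_line(expected_commutes, log_jams)
r2 = r_squared(expected_commutes, log_jams, intercept, slope)
print("R squared:", round(r2, 3))
print("residuals:", residuals(expected_commutes, log_jams, intercept, slope))
```

Plotting residuals like these against the fitted values, or mapping them by segment, is the usual way to spot the kind of systematic pattern described above.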

Predicted congestion (space)

Fitting a linear mixed effects model, we can predict on each street segment at each hour of the day. It too shows bias, notably at low values. This model, though, predicts better on a more difficult task because it accounts for random effects, leveraging the history of each segment to predict its future.
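The exact specification is not given here, so the following is only one plausible form, assuming a random intercept for each street segment. The symbols are introduced for illustration: y_{sh} is the (possibly log-transformed) jam count on segment s at hour h, x_{sh} holds the fixed-effect traits, and u_s is the segment-specific deviation that lets the model lean on each segment's own history.

```latex
y_{sh} = \beta_0 + \mathbf{x}_{sh}^{\top}\boldsymbol{\beta} + u_s + \varepsilon_{sh},
\qquad u_s \sim \mathcal{N}(0, \tau^2),
\qquad \varepsilon_{sh} \sim \mathcal{N}(0, \sigma^2)
```

Under this form, a segment that has been persistently more congested than its traits suggest receives a positive u_s, which carries over into its predictions for other hours.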

Predicted congestion (space + time)